Robust Speech-Annotated Photo Retrieval Using Syllable-Transformed Patterns
Authors
Abstract
This study presents a robust indexing and retrieval scheme for digital photos with speech annotations, based on syllable-transformed patterns. In speech retrieval applications, out-of-vocabulary (OOV) words and speech recognition errors often produce incorrect transcriptions and therefore degrade retrieval performance. In this study, the n-best syllable candidates recognized for each syllable are treated as an ordered pattern and converted into an “image-like” pattern using multidimensional scaling (MDS) for indexing and retrieval. Vector quantization is then applied to cluster these image vectors into indexing codewords. Finally, a vector space model (VSM)-based indexing mechanism is used to retrieve photos with a speech query. Experiments were conducted on the speech annotations of 1,055 collected digital photos. Compared with other conventional methods, the syllable-transformed pattern method shows promising improvement in speech-annotated photo retrieval.
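The abstract names the pipeline stages (n-best syllable lists → MDS “image-like” patterns → vector-quantized codewords → VSM retrieval) but not their details. The following is a minimal, illustrative sketch under stated assumptions, not the authors' implementation: the syllable distances fed to MDS, the flattening of each n-best list into a rank-by-dimension grid, plain k-means as the vector quantizer, and tf-idf weighting with cosine similarity are all assumptions, and every function name (`classical_mds`, `nbest_pattern`, `train_codebook`, `encode`, `build_index`, `search`) is hypothetical. It uses only the Python standard library.

```python
# Hypothetical sketch of a speech-annotated photo retrieval pipeline; not the paper's code.
import math
from collections import Counter


def classical_mds(dist, dims=2, iters=300):
    """Classical (Torgerson) MDS: double-centre squared distances, then extract the top
    `dims` eigenpairs by power iteration with deflation (assumes near-Euclidean input)."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    mean = [sum(row) / n for row in sq]  # row means (= column means, input is symmetric)
    grand = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [math.sin(i + 1.0 + k) for i in range(n)]  # arbitrary deterministic start
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient = eigenvalue; negative ones (non-Euclidean input) are clipped to 0
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][k] = math.sqrt(max(lam, 0.0)) * v[i]
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]  # deflate
    return coords


def nbest_pattern(nbest, syl_coords):
    """'Image-like' pattern (assumed): row r holds the MDS coordinates of the rank-r candidate."""
    return [x for syl in nbest for x in syl_coords[syl]]


def nearest(v, centroids):
    return min(range(len(centroids)),
               key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))


def train_codebook(vectors, k, iters=20):
    """Plain k-means vector quantization, seeded with the first k training vectors."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for c, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def encode(annotation, syl_coords, codebook):
    """Map each spoken syllable's n-best list to the index of its nearest codeword."""
    return [nearest(nbest_pattern(nb, syl_coords), codebook) for nb in annotation]


def tfidf(codes, idf):
    return {c: tf * idf.get(c, 0.0) for c, tf in Counter(codes).items()}


def cosine(a, b):
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs):
    """VSM index (assumed weighting): smoothed idf per codeword plus one tf-idf vector per photo."""
    df = Counter(c for d in docs for c in set(d))
    idf = {c: math.log(len(docs) / df[c]) + 1.0 for c in df}
    return idf, [tfidf(d, idf) for d in docs]


def search(query_codes, idf, doc_vecs):
    q = tfidf(query_codes, idf)
    return sorted(range(len(doc_vecs)), key=lambda i: -cosine(q, doc_vecs[i]))


if __name__ == "__main__":
    # Toy data: hypothetical syllables whose confusion distances come from made-up 2-D points
    syllables = ["ba", "pa", "ma", "ta"]
    pts = [(0, 0), (1, 0), (0, 2), (4, 3)]
    dist = [[math.dist(p, q) for q in pts] for p in pts]
    syl_coords = dict(zip(syllables, classical_mds(dist)))
    photos = [  # each annotation: one ordered 2-best list per spoken syllable
        [["ba", "pa"], ["ta", "ma"]],
        [["ma", "ba"], ["ma", "pa"]],
        [["ta", "ma"], ["pa", "ba"]],
    ]
    codebook = train_codebook([nbest_pattern(nb, syl_coords) for a in photos for nb in a], k=3)
    idf, doc_vecs = build_index([encode(a, syl_coords, codebook) for a in photos])
    query = [["pa", "ba"], ["ta", "ma"]]  # first syllable recognised with ranks swapped
    print(search(encode(query, syl_coords, codebook), idf, doc_vecs))
```

Running the script prints photo indices ranked by cosine similarity to the query. The intent of the MDS step is that a reordered n-best list such as `["pa", "ba"]` lies close to `["ba", "pa"]` in the embedded space, so both can map to the same codeword; whether they actually do in this toy run depends on the made-up data and on `k`.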
Similar resources
Development of syllable structure in Azeri-speaking children
Introduction: The length and complexity of syllable structure in children's utterances increase with age. Given the important and determining role of the syllable in the speech process, developmental studies of syllable acquisition in children are essential. The aim of the present study was to investigate the development and acquisition of syllable structure and the distribu...
Syllable timing patterns in Polish: results from annotation mining
Previous studies of duration variation in syllable constituents have yielded results for Polish which are clear outliers in relation to those for other languages. We report on a study of this issue in the context of TTS development, using a large annotated database. Global and local duration distance measures are applied to phoneme and syllable level units, and generalised iambic and trochaic d...
A robust/fast spoken term detection method based on a syllable n-gram index with a distance metric
For spoken document retrieval, it is crucial to consider out-of-vocabulary (OOV) words and the misrecognition of spoken words. Consequently, sub-word-unit-based recognition and retrieval methods have been proposed. This paper describes a Japanese spoken term detection method for spoken documents that robustly handles OOV words and misrecognition. To solve the problem of OOV keywords, we use indiv...
Word segmentation in Persian continuous speech using F0 contour
Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...
Robust Photo Retrieval using World Semantics
Photos annotated with textual keywords can be thought of as resembling documents, and querying for photos by keywords is akin to the information retrieval done by search engines. A common approach to making IR more robust involves query expansion using a thesaurus or other lexical resource. The chief limitation is that keyword expansions tend to operate on a word level, and expanded keywords ar...